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ADDER-SAVING IMPLEMENTATION OF 
DIGITAL INTERPOLATION/DECIMATION FIR FILTER 

BACKGROUND OF THE INVENTION 
FIELD OF THE INVENTION 

This invention relates generally to digital filters and, more particularly, to a novel 
architecture for a finite impulse response (FIR) filter. 

BRIEF DESCRIPTION OF THE PRIOR ART 

Digital filters are well known in the prior art. Such filters receive sampled digital signals 
and transmit a sampled waveform therethrough. The waveform transmitted by the digital filter is 
determined by coefficients operating on portions of the transmitted digital signal. A typical prior 
art digital filter has a plurality of serially connected delay components with an output of each 
delay component transmitted to a coefficient addition component, the coefficient addition 
component adding the output from the delay component applied thereto by a weighting factor 
derived from a transform function. The outputs of the coefficient addition components are 
applied to the input of a succeeding delay element and eventually provide the filter output signal. 
Accordingly, an input signal, after an appropriate delay, is filtered according to the coefficient 
addition components with the resulting signal being applied to the digital filter output. 

A typical prior art FIR filter is shown in FIGURE 1 which shows the register-level 
implementation diagram for a seven-tap FIR filter which operates in accordance with the 
equation: 
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H(z) = biz" 1 + b 2 z" 2 + b 3 z" 3 + b 4 z" 4 +b 5 z" 5 + b 6 z~ 6 + b 7 z~ 7 
where H(z) = Y(z)/X(z) is the transfer function of the system. For a linear-phase response, the 
coefficients must be symmetric with respect to the center tap with z" 1 representing a register unit 
(such as a D flip-flop) to store the result of the previous calculation. The input data x(n) can be 
interpolated by, for example, 2 before filtering in the present example, though this number is 
arbitrary. Interpolation by M means that M - 1 zeros are inserted between adjacent input 
samples. On the other hand, decimation by M means that every M - 1 input samples are 
dropped. Digital filters which perform these functions are known as interpolation/decimation 
filters. The interpolation/decimation filters are commonly used in modern digital communication 
and audio systems. Assuming that the input data rate is elk, then the clock rate for the FIR filter 
in FIGURE 1 has to be twice elk due to the interpolation at the front or at the filter input. 

The equation for any FIR filter H(z) can be decomposed into Z^HoCz) + Hi(z) where: 

Ho(z) = bi + b 3 z 2 + b 5 z" 4 + b 7 z" 6 
and Hi(z) = b 2 z" 2 + b 4 z" 4 + b 6 z" 6 . 

Therefore, the system of FIGURE 1 can be upgraded into a system as shown in FIGURE 2 which 
includes two serially connected sets of delay elements (e.g. D flip-flops), each delay element 
except the first is preceded by an adder which adds in one of the coefficients. One set of delay 
elements includes a first delay element which receives the input x(n) and multiplies the input by 
the coefficient b 7 at a first delay, the output of which is added to the input multiplied by the 
coefficient b$ and delayed with the output of that delay being added to the input multiplied by the 
coefficient the b 3 and being delayed, the output thereof being added to the input multiplied by 
the coefficient bi and being delayed with the output of the final delay being multiplexed with the 
output of the other set of delay elements to provide the output Yn of the filter. The second set of 



delay elements also receives the input x(n) and multiplies the input by the coefficient b 6 at a first 
delay, the output of which is added to the input multiplied by the coefficient b 4 and delayed with 
the output of the delay being added to the input multiplied by the coefficient b 2 and delayed with 
the output of that delay being added to zero and delayed to the output for multiplexing as 
discussed above. It can be seen that the input x(n) is applied to both sets of delay elements with 
the outputs of the two sets of delay elements being multiplexed to provide output y(n). 

The advantage of the system of FIGURE 2 over that of FIGURE 1 is that the clock rate of 
the fiker is the same as the input data rate elk, which is half of the clock rate of the system of 
FIGURE 1. However, the system of FIGURE 2 has two basic disadvantages, these being (1) that 
the amount of hardware, including adders and registers, is no less than that in the FIGURE 1 
system and (2) that the multiplexer connecting the output of each set of delay elements may 
cause glitches in the final filter output, y(n). More specifically, there are the same number of 
adders and one more register used in the FIGURE 2 system wherein the adder for adding zero is 
not actually needed, but the associated register is necessary in order to maintain the correct 
output. For even-order FIR filters, the number of adders and registers is the same for the 
FIGURE 1 and FIGURE 2 systems. With reference to the multiplexer, in digital design, it is 
desirable to obtain the output from a register rather than from combinational circuits. Placing the 
multiplexer within the register can resolve the glitch issue, however it will introduce more gates 
into the circuits, which is not desirable. 



SUMMARY OF THE INVENTION 

Because the output of the Mth order FIR filter depends upon its previous M input data, 
the number of registers is usually no less than M, However, a number of adders can be saved 
(eliminated) through the resource of reuse, which is an important feature of the present invention. 
Since an N-bit ripple adder takes about the same number of gates as does an N-bit register, 
reducing the number of adders also can reduce quite significantly the silicon area of the 
semiconductor chip required for the filter. 

The second set of delay elements of FIGURE 2 is identical in structure to the first set of 
delay elements. Therefore, in accordance with the present invention, the adders in the first set of 
delay elements are reused or, in other words, these adders are used in conjunction with both the 
first and second set of delay elements. To this end, the filter is operated at twice the clock rate of 
elk. In the first half of the elk period, the output of the first set of delay elements is calculated 
and in the second half of the elk period, the output of the second set of delay elements is 
calculated. The output y(n) is the output of the last register in the first set of delay elements with 
no output multiplexer required. In summary, the adders of the first set of delay elements are 
reused by running the filter at twice the clock rate and computing Ho(z) and Hi(z) on a time- 
sharing basis. 

The system can be further optimized by omitting two redundant registers as shown in 
FIGURE 4. The first delay element in each set of delay elements Qi and Q 2 can be shared, 
thereby removing one of the delay elements. Also, the last delay element in each set of delay 
elements Q 7 and Q 8 can be shared, thereby removing another delay element. Since the delay 
element Q 7 stores the quantity associated with the coefficient b 7 in the first clock phase, in order 
to share Q 7 in the second clock phase, the corresponding coefficient needs to be adjusted by 



subtracting b 7 from b 6 to obtain the correct result. Multiplication of x by bi is actually performed 
by shifts and addition of x in standard manner. 
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BRIEF DESCRIPTION OF THE DRAWINGS 

FIGURE 1 is a diagram of a prior art FIR filter; 

FIGURE 2 is a diagram of an upgraded FIR filter in accordance with the prior art; 
FIGURE 3 is a diagram of an FIR filter in accordance with a first embodiment of the 
present invention; 

FIGURE 4a is a diagram of an FIR filter in accordance with a second embodiment of the 
present invention; 

FIGURE 4b is a block diagram of an FIR filter in accordance with a second embodiment 
of the present invention; and 

FIGURE 5 is a timing diagram for use in conjunction with the FIR filter of FIGURE 4b. 
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DESCRIPTION OF THE PREFERRED EMBODIMENT 

With reference first to FIGURE 3, there is shown a first embodiment in accordance with 
the present invention. In accordance with this embodiment, adders are saved through reuse. As 
stated above, since an N-bit ripple adder takes about the same number of gates a does an N-bit 
register, reducing the number of adders can also reduce the silicon area of the semiconductor 
chip required for the components quite significantly. Since the second set of delay elements of 
FIGURE 2 is identical in structure to the first set of delay elements, the reuse of adders in the 
first set of delay elements can be effectuated. To this end, the filter is operated at twice the clock 
rate of elk. In the first half of the elk period, the output of the first set of delay elements Q 7 , Q5, 
Q3 and Qi is calculated and in the second half of the elk period, the output of the second set of 
delay elements Q 8 , Q 6 , Q4 and Q2 is calculated as shown in FIGURE 3. The output y(n) is the 
output of the last register in the first set of delay elements with no output multiplexer required. 
The adders of the first set of delay elements are reused by running the filter at twice the clock 
rate and computing H 0 (z) and Hi(z) on a time-sharing basis using only the adders shown as 
coupled to the delay elements Q 7 , Q5, Q 3 and Qi on a multiplexed basis. 

The system of FIGURE 3 can be further optimized to omit two redundant registers. In 
this regard, delay elements Qi and Q2 can be shared, thereby removing delay element Q2. Also 
delay element Q 8 can be shared with delay element Q 7 , thereby removing delay element Q 7 . 
Since delay element Q 7 stores b 7 x in the first clock phase, it has to be adjusted by subtracting b 7 x 
in order to obtain the correct value in the second clock phase. This system is shown in FIGURE 
4. Multiplication of x by bi is actually performed by shifts and addition of x. 

The example as set forth in FIGURE 4 is an interpolation filter with interpolation factor 
of two, it being understood that the invention herein can be applied to a decimation filter as well. 



A decimation filter is similar to an interpolation filter in its implementation except that, with 
reference to FIGURE 2, the multiplex output turns into the input multiplexer in the front for the 
decimation filter. This invention can also be applied to filters with interpolation/decimation 
factor other than two, the number two being used herein only by way of example. 

The silicon area saved by the present invention for an Lth order filter with decimation 
factor M can be estimated. The major hardware for the filter is assumed to be the adders and the 
registers. It is also assumed that an N-bit adder has the same gate count as an N-bit register. The 
total number of registers required is NL, and the adders equivalently occupy NL register areas. 
Therefore, the total number of areas required for the system of FIGURE 1 is 2 NL. In contrast, 
there are only N/M adders in the system of FIGURE 4b, therefore the total number of areas 
required is (1 + 1/M)NL for the system of FIGURE 4b. For the foregoing example, with M = 2, 
the required chip area is reduced by 25 percent by this estimation. Greater chip area reduction 
can be achieved with a higher interpolation/decimation factor M. A 7th order filter with M = 2 
can be synthesized with a 24 percent reduction in gate count. There are slightly more gates 
required than the estimation, mainly due to the small number of multiplexers required for the 
coefficients as shown in FIGURE 4b. 

With reference to FIGURE 4b, the input x is applied to each of a first multiplier where it 
is multiplied by the coefficient b 7 , a second multiplier where input x is multiplied by result of 
coefficients b 7 - b 6 , a third multiplier where input x is multiplied by coefficient b 5 , a fourth 
multiplier where input x is multiplied by coefficient b 4 , a fifth multiplier where input x is 
multiplied by coefficient b 3 , a sixth multiplier where input x is multiplied by a coefficient b 2 and 
a seventh multiplier where input x is multiplied by a coefficient bi. The b 7 product is delayed by 
D flip-flop Q 7) which is clocked at a clock rate of clk/2 and supplied to a first adder. The b<> - b 7 
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product and the bs product are multiplexd by the clock elk, the output of which is added to the 

output of flip flop Q7 in the first adder to provide an output Si. This output is delayed in D flip 

flops and Q5 which are also clocked at the clock rate clk/2 with the output applied to a second 

adder. The b4 product and b3 product are multiplexd by the elk, the output of which is applied to 

the second adder. The sum of the two signal applied to the second adder is the output S2 which 

is applied to D flip-flops Q4 and Q3 which are also clocked at a clock rate of clk/2 with the output 

of these flip-flops being applied to a third adder. The b2 and bi products are multiplexed by the 

elk, the output of which is also applied to the third adder and added to provide the output S3. This 

output is delayed by D flip-flop Q2 which is clocked at the clock rate clk/2 to provide the output 

y. The timing diagram for the circuit of FIGURE 4b is shown in FIGURE 5 where all registers 

are clocked by clk/2 while all multiplexers are controlled by elk which operates at one half the 

speed of clk/2. The following equations are true for the embodiment of FIGURE 5: 

At phase 0: Sl(i) = b 7 x(i) + (b 6 - b 7 ) x(i) = b 6 x(i) 

S2(i) = Sl(i-2) + b 4 x(i) = b 6 x(i-2) + b 4 x(i) 

S3(i) = S2(i-2) +b 2 x(i) = b 6 x(i-4) + b 4 x(i-2) +b 2 x(i) 

At phase 1 : S 1 (i) = b 7 x(i-2) + b 5 x(i) 

S2(i) = Sl(i-2) + b 3 x(i) = b 7 x(i-4) + b 5 x(i-2) + b 3 x(i) 

S3(i) = S2(i-2) + bix(i) = b 7 x(i-6) + b 5 x(i-4) +b 3 x(i-2) + bix(i) 

where i is the time index and i-n means n time clocks later. In phase 0, the result of the 
upper half branch in FIGURE 4b is obtained and in phase 1 the result of the lower half branch in 
FIGURE 4b is obtained. 

It can be seen that the function of the circuit of FIGURE 2 is provided in FIGURES 4 and 
5 without four of the adders, two of the flip-flops and without the multiplexer. Accordingly, a 
significant amount of chip area is saved for other uses. 



Though the invention has been described with reference to a specific preferred 
embodiment thereof, many variations and modifications will immediately become apparent to 
those skilled in the art. It is therefore the intention that the appended claims be interpreted as 
broadly as possible in view of the prior art to include all such variations and modifications. 
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